Understanding Chinese Texts via Statistical Inference

Ke Deng/邓柯 (Center for Statistical Science, Tsinghua University)

27-Dec-2020, 08:30-09:15

Abstract: With the growing availability of digitized text data, both public and private, there is a great need for effective computational tools that automatically extract information from texts. Chinese differs most significantly from alphabet-based languages in that it does not mark word boundaries. As a result, most existing Chinese text-mining methods require a prespecified vocabulary and/or a large, relevant training corpus, which may not be available in some applications. We propose a family of statistical approaches that can perform multiple NLP tasks simultaneously with little training information. These tasks include word discovery, named entity recognition, word segmentation, semantic understanding and relation extraction. These approaches are particularly useful for mining domain-specific texts where the underlying vocabulary is unknown and/or the texts of interest differ significantly from standard training corpora.

Mathematics

Audience: researchers in the topic


ICCM 2020

Organizers: Shing Tung Yau, Shiu-Yuen Cheng, Sen Hu*, Mu-Tao Wang
*contact for this listing
